Training system and processes for objects to be classified

ABSTRACT

The present disclosure relates to a training system and, more particularly, to a method and system for training on objects to be classified and related processes. The processes include: extracting, using a computing device, features of a plurality of objects; training, using the computing device, a machine learning model with selected ones of the extracted features; building, using the computing device, a final machine learning model of the selected features after all of the plurality of objects for training are captured; and performing, using the computing device, an action on subsequent objects based on the trained final machine learning model.

TECHNICAL FIELD

The present disclosure relates to a training system that can be taught by an operator and, more particularly, to a method and system for training on objects to be classified and related processes.

BACKGROUND

Classification models in embedded systems are used in many situations, such as when attached to robots and machinery in factories and distribution centers. The training of these systems is performed off-site, which requires high computation, and the more images and the more complex the models (such as deep learning) used, the greater the computation required. Also, once trained, the system is brought on-site to perform its functions; however, at this deployment stage, the training may not have been sufficient, or software updates may be needed. To provide these, it is again necessary to develop the training off-site or develop software patches off-site, both of which are costly and time-consuming, and which results in an inefficient use of the system itself.

SUMMARY

In a first aspect of the present disclosure, a method comprises: extracting, using a computing device, features of a plurality of objects; training, using the computing device, a machine learning model with selected ones of the extracted features; building, using the computing device, a final machine learning model of the selected features after all of the plurality of objects for training are captured; and performing, using the computing device, an action on subsequent objects based on the trained final machine learning model.

In a further aspect of the present disclosure, there is a system which comprises a processor, a computer readable memory, one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions executable to: receive captured images, data, and features of a plurality of objects from a sensor; extract selected features from the captured images; train a machine learning model with the selected captured and extracted features; build a final machine learning model of the selected features after training from the plurality of objects is completed; and perform an action on subsequent objects based on the trained final machine learning model.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure is described in the detailed description which follows, in reference to the noted plurality of drawings by way of non-limiting examples of exemplary embodiments of the present disclosure.

FIG. 1A shows an overview of the training system in accordance with aspects of the present disclosure.

FIG. 1B shows an overview of a fixed line scan camera implemented in the system in accordance with aspects of the present disclosure.

FIG. 1C shows an overview of a fixed area scan camera implemented in the system in accordance with aspects of the present disclosure.

FIG. 1D shows an overview of a mobile line scan camera implemented in the system in accordance with aspects of the present disclosure.

FIG. 1E shows an overview of a mobile area scan camera implemented in the system in accordance with aspects of the present disclosure.

FIG. 2 shows an exemplary computing environment in accordance with aspects of the present disclosure.

FIG. 3 shows a block diagram using a batch training process in accordance with aspects of the present disclosure.

FIG. 4 depicts an exemplary flow using a batch training process with a fixed camera in accordance with aspects of the present disclosure.

FIG. 5 depicts an exemplary flow using a batch training process with a moving camera in accordance with aspects of the present disclosure.

FIG. 6 shows a block diagram using a mixed training process in accordance with aspects of the present disclosure.

FIG. 7 depicts an exemplary flow using a mixed training process with a fixed camera in accordance with aspects of the present disclosure.

FIG. 8 depicts an exemplary flow using a mixed training process with a moving camera in accordance with aspects of the present disclosure.

DETAILED DESCRIPTION

The present disclosure relates to a training system and, more particularly, to a method and system for training on objects to be classified and related processes. In accordance with aspects of the present disclosure, the system for training can be implemented as an on-site teachable or trainable classification machine using machine learning and computer vision, both of which are used to train on objects to be classified by the system. Advantageously, the approach described herein will greatly speed up the development and installation of classification systems, such as sorting machines, as the training can be performed directly by the user of the machine, on-site.

In more specific embodiments, the present disclosure is directed to systems and processes that can be used to capture training data, label it, and perform training on objects using machine learning models, which use the captured data to produce classification models for object classification. Advantageously, the system can be trained by the users themselves, on-site. By implementing the processes described herein, it is thus possible to train on objects, on-site, with different classifications of objects. After the classification is provided, some action can be taken on the object, e.g., sorting, classifying, counting or determining some other physical characteristic.

For example, the typical process of training machine learning models for classification of physical objects consists of training and testing/deployment phases, where training is conducted off-site and performed by data scientists or machine learning researchers. In contrast, the systems and processes described herein allow training and modeling on-site, and can be used by any user (i.e., a user that does not have any background in machine learning). The systems and processes allow for live validation, where, after finishing the training phase, a validation phase is conducted to check the results on new objects that were not previously seen. Based on the results, the user might decide to add more training examples or to stop training and switch to production mode. The training can be conducted by grouping similar objects in batches (e.g., batch training) and performing the training on them, or by putting all items into a mixed group (e.g., mixed training) and manually labeling these mixed items (objects). This can be used with any classification task, such as classifying fruits, bottles, defective parts, etc. In embodiments, one or more features of the objects can be used to classify the objects.

FIG. 1A shows an overview of the system in accordance with aspects of the present disclosure. In particular, the system 10 includes hardware parts and software for training and classification of objects. The system 10 includes a vision system 12 and, in embodiments, other input data sources 14. The vision system 12 can be various types of image capturing devices, including, e.g., gray scale cameras, color cameras, multi-spectral cameras, hyper-spectral cameras, thermal cameras, X-ray imaging, ultrasound imaging, and any other imaging devices and modalities. In embodiments, cameras can be line scan, area scan, or point scan cameras, 2D or higher dimensional scanners, and/or point cloud through 3D scanning sensors including LIDAR. The other input data sources 14 can be scales (weight), distance sensors, spectrometers, any other sensor types capable of detecting a characteristic of a physical object, and external sources of data about the objects and the environment. In embodiments, information obtained from the vision system 12 and, in embodiments, other input data sources 14, can include images, size, aspect ratio, color, reflectance, perimeter, texture, weight, temperature, humidity, material composition, point cloud, or other desired characteristics of the objects which can be used for categorizing these objects at a later stage. It should be understood that the images and/or data can be captured using a single camera, multiple cameras, a single sensor or multiple sensors, or combinations thereof.

Still referring to FIG. 1A, the information obtained from the vision system 12 and, in embodiments, other input data sources 14 is provided to a computing device or system 100. The computing system 100 includes machine learning modules and training modules 115a/115b, which can be used for training purposes to deploy trained models to an output 16. As described herein, the output 16 of the computing system 100 can be used to do various things, such as controlling devices or actuators (e.g., sorting machine, robotic arms, air pumps, etc.), saving results to a database, or triggering other actions, either physical or programmatic, and also either for a local system or external systems. In embodiments, at the training phase, objects can be detected and segmented from background before classifying them using various algorithms as described herein.

FIGS. 1B and 1C show the use of a fixed camera to capture object information (e.g., characteristics of the object) on a conveyor or other system 200; whereas FIGS. 1D and 1E show the use of a moving camera to capture object information (e.g., characteristics of the object). In embodiments, the conveyor system 200 can also be representative of a sorting machine. Other situations might arise for the fixed camera scenario, such as fixing the system above streets, or rivers, or any path in which there are moving objects to classify, all of which are represented at reference numeral 200. It should be understood by those of skill in the art that there are many applications for a fixed camera or sensor system, other than just sorting on a conveyor. By way of some examples, a fixed system can include: (i) inspection on a conveyor to classify objects, such as part defects and fruit grades, etc.; (ii) a fixed sensor or camera above a street to classify moving vehicles (e.g., cars, buses, trucks, motorcycles, etc.); (iii) a fixed sensor or camera above some point over a river to classify flowing objects (e.g., boats, animals or birds, debris or plants, etc.); and (iv) a fixed sensor or camera under moving objects, e.g., for classifying flying airplanes, birds, drones, etc. Another example is using a fixed system (e.g., camera or sensor) with a fixed object. An application is object monitoring and classifying its state: if the state is altered (e.g., objects heated through friction, captured by a thermal camera or thermal sensor), the system can provide an alert, turn off the monitored device, or provide commands to another system.

It is also noted that the moving body that the system is attached to is not limited to drones as illustrated here, but can be any vehicle, drone, or moving robot (bi-pedal; 4, 6, or 8 legged robots; robots on tires; robotic arm; etc.). As should be understood by those of skill in the art, the above are examples of the moving system, where additional applications are contemplated in which the classification device is attached to a moving body to make the classification on fixed objects. By way of some examples, a moving system can include: (i) attaching the system to a drone and flying it above a field to classify crops by, e.g., crop type, ripeness, and health; (ii) attaching the system to a vehicle robot (e.g., on tires) that goes through a field to identify weeds and remove them; (iii) attaching the system to the front of a moving car/truck to classify road defects while driving (e.g., holes and cracks), or to identify garbage on the street; and (iv) attaching the system to a moving robotic arm that deals with objects (e.g., sorting or assembling), to classify them with the device and deal with them accordingly. It is also contemplated that there are cases in which the system (e.g., sensor and/or camera) is attached to a moving body and the objects to be classified are also moving. Some examples include: (i) the system is attached to a car while driving and classifies other cars, either moving or standing, e.g., used on a police car; and (ii) the system might be attached under a fishing boat to classify fish that swim under it. In this latter example (ii), it is possible to classify either whether there is any fish (e.g., fish or no fish) or classify fish by their type (e.g., salmon, etc.).

In embodiments, both the fixed camera and moving camera implementations can be a point scan or line scan camera 12a (FIG. 1B and FIG. 1D), an area scan camera 12b (FIG. 1C and FIG. 1E), or other scanning technologies in more dimensions, such as 3D scanning and depth scanning. As should be understood by those of ordinary skill in the art, an area scan camera provides a fixed resolution, imaging a defined area; whereas a line scan camera builds the image a single pixel row at a time as the object passes through the scan line with a linear motion. The moving camera can be implemented with a drone, for example. In all of these implementations, the objects can be used for training using a mixed training process, although batch training is also contemplated herein.

FIG. 2 is an illustrative architecture of a computing system 100 implemented as embodiments of the present disclosure. The computing system 100 is only one example of a suitable computing system and is not intended to suggest any limitation as to the scope of use or functionality of the present disclosure. Also, computing system 100 should not be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the computing system 100.

As shown in FIG. 2, the computing system 100 includes a computing device 105. The computing device 105 can be resident on a network infrastructure such as a local network, remote network, or within a cloud environment, or may be a separate independent computing device (e.g., an edge computing device, PC, or workstation). The computing device 105 may include a bus 110, a processor 115, a storage device 120, a system memory (hardware device) 125, one or more input devices 130, one or more output devices 135, and a communication interface 140. The bus 110 permits communication among the components of the computing device 105. The bus 110 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures to provide one or more wired or wireless communication links or paths for transferring data and/or power to, from, or between various other components of the computing device 105.

The processor 115 may be one or more conventional processors or microprocessors that include any processing circuitry operative to interpret and execute computer readable program instructions, such as program instructions for controlling the operation and performance of one or more of the various other components of the computing device 105. The program instructions are also executable to provide the functionality of the system including, e.g., detection, segmentation, features extraction and selection, and classification, directly on an edge device, on a device in the local network, or on a remote device in a remote network or on the cloud (each one of which can be representative of the computing infrastructure of FIG. 2). By way of another example, the program instructions are executable to directly switch between training and deployment operation modes, to immediately be used after training. In further embodiments, the computing infrastructure can be a handheld device (e.g., phone or tablet) that can contain the system or is part of the system. For example, the camera, sensors, storage, processing, and computation units can be gathered in one enclosure (e.g., handheld device or other single unit as depicted in FIG. 2) or developed into separate modules that are connected within a same location or distributed into many locations.

Many types of processors can be used, e.g., a Central Processing Unit (CPU), Graphics Processing Unit (GPU), AI accelerators, microcontrollers, Field Programmable Gate Arrays (FPGA), or any other Application Specific Integrated Circuit (ASIC). In embodiments, the processor 115 interprets and executes the processes, steps, functions, and/or operations of the present disclosure, which may be operatively implemented by the computer readable program instructions. By way of illustration, the processor 115 includes a detection and feature extraction and selection module 115a and a machine learning and training module 115b, used to train and deploy the models, e.g., train, validate, and classify objects, as described in more detail below.

In embodiments, the processor 115 may receive input signals from one or more input devices 130 and/or drive output signals through one or more output devices 135. The input devices 130 may be, for example, a keyboard or touch sensitive user interface (UI) or any of the sensors described with respect to FIGS. 1A-1E. The output devices 135 can be, for example, any display device, printer, etc., as further described below.

Still referring to FIG. 2, the storage device 120 may include removable/non-removable, volatile/non-volatile computer readable media, which is non-transitory media such as magnetic and/or optical recording media and their corresponding drives. The drives and their associated computer readable media provide for storage of computer readable program instructions, data structures, program modules and other data for operation of the computing device 105 and training machine learning models. In embodiments, the storage device 120 may store operating system 145, application programs 150, and program data 155 in accordance with aspects of the present disclosure.

The system memory 125 may include one or more storage mediums, which is non-transitory media such as flash memory, permanent memory such as read-only memory ("ROM"), semi-permanent memory such as random access memory ("RAM"), any other suitable type of storage component, or any combination thereof. In some embodiments, a basic input/output system 160 (BIOS) including the basic routines that help to transfer information between the various other components of computing device 105, such as during start-up, may be stored in the ROM. Additionally, data and/or program modules 165, such as at least a portion of operating system 145, application programs 150, and/or program data 155, that are accessible to and/or presently being operated on by processor 115, may be contained in the RAM.

The one or more input devices 130 may include one or more mechanisms that permit an operator to input information to computing device 105, such as, but not limited to, a touch pad, dial, click wheel, scroll wheel, touch screen, one or more buttons (e.g., a keyboard), mouse, game controller, track ball, microphone, camera, proximity sensor, light detector, motion sensors, biometric sensor, or any of the sensors already described herein (e.g., as shown and described with respect to FIGS. 1A-1E) and combinations thereof. The one or more output devices 135 may include one or more mechanisms that output information to an operator, such as, but not limited to, audio speakers, headphones, audio line-outs, visual displays, antennas, infrared ports, tactile feedback, actuators, other computing devices, databases, printers, or combinations thereof.

The communication interface 140 may include any transceiver-like mechanism (e.g., a network interface, a network adapter, a modem, a cellular network (such as LTE, 2G, 3G, 4G, and 5G), or combinations thereof) that enables computing device 105 to communicate with remote devices or systems, such as a mobile device or other computing devices such as, for example, a server in a networked environment, e.g., local network, remote network, or cloud environment. For example, the computing device 105 may be connected to remote devices or systems via one or more local area networks (LAN) and/or one or more wide area networks (WAN) using the communication interface 140, either wired or wireless. In addition, the system can use other types of connections, such as FireWire, parallel port, serial port, PS/2 port, USB port (any version), and Thunderbolt port.

As discussed herein, the computing system 100 may be configured and trained to provide a model for the objects which are trained upon. The model can then be used to classify subsequent objects that are detected by the sensors. In particular, the computing device 105 may perform tasks (e.g., process, steps, methods and/or functionality) in response to the processor 115 executing program instructions contained in a computer readable medium, such as system memory 125. The program instructions may be read into system memory 125 from another computer readable medium, such as data storage device 120, or from another device via the communication interface 140 or a server within a local or remote network, or within a cloud environment.

By way of more specific example and using the computing system 100 described herein, a training phase can be conducted as described next. At the training phase, object detection can be considered in two separate situations: (i) the objects are manually selected (such as in a mixed training situation), e.g., already detected, and no further processing is needed for detection; or (ii) objects of similar classes are presented (e.g., as in the batch training process, where the objects have similar features (e.g., all red apples or all green apples, etc.)). In the latter case, the objects are detected automatically and separated from the background using feature extraction techniques known to those of skill in the art, e.g., using known object detection algorithms. Another method of object detection for the latter case is to use external triggers that are connected to the system to trigger it to capture objects upon arrival, such as infrared triggers. The computing system 100 can interact with different systems and interfaces by obtaining the data, sending the data, getting control or trigger signals, or sending control or trigger signals.

In one example, image processing algorithms can be used if the background is of homogeneous texture, intensity, or color that can be easily distinguished from the objects. Such algorithms can include edge detection and contour detection algorithms, or algorithms based on colors and texture segmentation. In a more challenging situation, more advanced classification algorithms can be used for object detection, such as Histogram of Oriented Gradients (HOG), spectral and wavelet methods, and deep learning algorithms, e.g., Yolo and RetinaNet (any of their versions), SPP-Net, Feature Pyramid Networks, Single shot detector (SSD), or Faster R-CNN (or any of its preceding versions). These algorithms can detect objects even when the background might provide some noise or interference in detecting the object.
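
By way of a non-limiting illustration only, the following minimal Python sketch (assuming OpenCV is available; the file name and area threshold are hypothetical placeholders) shows the simple case of separating objects from a roughly homogeneous background using thresholding and contour detection:

```python
import cv2

# Non-limiting sketch: threshold-and-contour detection of objects against a
# roughly homogeneous background. "frame.png" and the minimum contour area
# are hypothetical placeholders.
image = cv2.imread("frame.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# Otsu's method picks a global threshold separating foreground from background.
_, mask = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# External contours approximate the object boundaries; small areas are noise.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for contour in contours:
    if cv2.contourArea(contour) > 500:
        x, y, w, h = cv2.boundingRect(contour)
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
```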

Feature extraction is done prior to the classification and can be conducted by the feature extraction and selection module 115a, in which all feature types are first decided upon and then extracted (using the sensors as described in FIGS. 1A-1E). In this case, the best applicable features are selected in a feature selection phase. The feature selection can be implemented by way of algorithms under filter, wrapper, or embedded methods. Such algorithms include, e.g., forward selection, backward selection, correlation-based feature selection, recursive feature elimination, Lasso, tree-based methods, and genetic algorithms. Also, projection algorithms of the feature selection module 115a can be used for feature reduction, such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Mixture Discriminant Analysis (MDA), Quadratic Discriminant Analysis (QDA), and Flexible Discriminant Analysis (FDA), where features are projected into a lower dimension space. Some algorithms might combine feature extraction and classification, as in deep learning algorithms, as should be known in the art such that no further explanation is required.
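
As a hedged, non-limiting sketch of the wrapper and projection methods named above (assuming scikit-learn, with synthetic data standing in for real extracted features):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for extracted features: 200 objects, 40 candidate
# features, 3 object classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))
y = rng.integers(0, 3, size=200)

# Wrapper method: recursive feature elimination around a classifier keeps
# the 10 most useful features.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)
X_selected = selector.fit_transform(X, y)

# Projection method: PCA projects the features into a lower dimension space.
X_reduced = PCA(n_components=10).fit_transform(X)
print(X_selected.shape, X_reduced.shape)  # (200, 10) (200, 10)
```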

In embodiments, the extracted features can be classified under the following categories (an illustrative sketch follows the list):

(i) Shape features, e.g., size, perimeter, area, chain codes, Fourier descriptors, shape moments;

(ii) Texture features, which can be implemented using Local Binary Patterns (LBP), Gabor filter features, Haralick texture features, and the Grey-Level Co-occurrence Matrix (GLCM), which is a histogram of co-occurring greyscale values at a given offset over an image (for example, samples of two different textures can be extracted from a single image), along with features extracted from the GLCM, etc.; and

(iii) Color and intensity features using, e.g., color moments and color histograms.
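
The following non-limiting sketch illustrates categories (ii) and (iii) above, assuming scikit-image and NumPy are available; the random patch is a hypothetical stand-in for a segmented object image:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops, local_binary_pattern

# A random 8-bit patch stands in for a segmented object image.
patch = (np.random.default_rng(0).random((64, 64)) * 255).astype(np.uint8)

# (ii) GLCM: histogram of co-occurring greyscale values at a given offset,
# from which scalar texture features (e.g., contrast, homogeneity) are derived.
glcm = graycomatrix(patch, distances=[1], angles=[0], levels=256,
                    symmetric=True, normed=True)
contrast = graycoprops(glcm, "contrast")[0, 0]
homogeneity = graycoprops(glcm, "homogeneity")[0, 0]

# (ii) Local Binary Patterns, summarized as a histogram.
lbp = local_binary_pattern(patch, P=8, R=1, method="uniform")
lbp_hist, _ = np.histogram(lbp, bins=10, density=True)

# (iii) Intensity histogram as a simple color/intensity feature.
intensity_hist, _ = np.histogram(patch, bins=16, range=(0, 255), density=True)

feature_vector = np.concatenate([[contrast, homogeneity], lbp_hist, intensity_hist])
```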

In addition to image features, other features can be utilized such as weight, temperature, humidity, depth, point cloud, material composition, and dimensions if taken by, e.g., laser sensors or other sensors.

Also, features and data from external sources can be utilized in training the models and classification, such as weather data and GPS location, as illustrative examples. In embodiments, such external data can be used to augment classification capability, in either the training phase or the deployment phase.
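
A minimal, non-limiting sketch of fusing such sensor readings and external data with image features, where every value is a hypothetical placeholder for one object:

```python
import numpy as np

# All values below are hypothetical placeholders for a single object.
image_features = np.array([0.42, 0.13, 0.88])      # e.g., color/texture features
sensor_features = np.array([152.0, 21.5])          # e.g., weight (g), temperature (C)
external_features = np.array([0.65, 35.1, -97.4])  # e.g., humidity, GPS lat/lon

# One row of the training matrix; stacking such rows over all objects gives X.
sample = np.concatenate([image_features, sensor_features, external_features])
```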

After feature extraction and selection, a classification model is trained using the classification module (e.g., machine learning) 115b. In embodiments, the classification module 115b can use any multi-class classification algorithms, e.g., logistic regression, decision trees, Support Vector Machines (SVM), Naive Bayes, Gaussian Naive Bayes, k-Nearest Neighbors (kNN), K-Means, Expectation Maximization (EM), reinforcement learning algorithms, Artificial Neural Networks, and deep learning algorithms (e.g., Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNNs), Long Short-Term Memory Networks (LSTM), Stacked Auto-Encoders, Deep Boltzmann Machines (DBM), and Deep Belief Networks (DBN)), etc. In further embodiments, it is possible to train multiple classifiers using the classification module (e.g., machine learning) 115b, e.g., with different algorithms in an ensemble method.
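
By way of a non-limiting sketch (assuming scikit-learn, with a synthetic dataset standing in for real extracted features), training one such multi-class classifier may look as follows; any of the listed algorithms could be substituted:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic features stand in for the selected features of captured objects.
X, y = make_classification(n_samples=300, n_features=12, n_informative=6,
                           n_classes=3, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3,
                                                  random_state=0)

# Any of the listed multi-class algorithms could be substituted here.
clf = SVC(kernel="rbf").fit(X_train, y_train)
print("validation accuracy:", clf.score(X_val, y_val))
```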

For example, an ensemble of classifiers can be built in one of two ways: either construct an ensemble consisting of several classifiers of the same type (algorithm), or construct an ensemble consisting of several classifiers of two or more types. Using ensembles usually results in more accurate results. There are several types of ensemble methods, such as bagging and boosting. Examples of such algorithms are random forest, AdaBoost, gradient boosting algorithms, XGBoost, and Gradient Boosting Machines (GBM). In embodiments, each of these methods is used as one classifier, where it is possible to stack different classifiers and combine their outputs to obtain the final classification.
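
A hedged sketch of the two ensemble families mentioned above (bagging and boosting), again assuming scikit-learn and placeholder data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=12, n_informative=6,
                           n_classes=3, random_state=0)

# Bagging-style ensemble of classifiers of the same type (decision trees).
bagging = RandomForestClassifier(n_estimators=100).fit(X, y)

# Boosting-style ensemble, also built from classifiers of the same type.
boosting = GradientBoostingClassifier(n_estimators=100).fit(X, y)
```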

In further embodiments, to obtain the final output for ensemble methods, there are different techniques for voting, including majority voting and a trained voting classifier. In majority voting, the classification is selected by a majority vote of the classifiers' outputs. In the trained voting classifier, it is possible to train a classifier whose input is the output of the different classifiers and whose output is the final classification. The types and numbers of stacked classifiers can either be set manually or a search algorithm can select them (though the latter takes a much longer time for training).
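
The following non-limiting sketch illustrates both techniques, majority voting over heterogeneous classifiers and a trained (stacked) voting classifier, under the same scikit-learn and placeholder-data assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, StackingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=12, n_informative=6,
                           n_classes=3, random_state=0)
base = [("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True)),
        ("rf", RandomForestClassifier())]

# Majority voting over the classifiers' outputs.
majority = VotingClassifier(estimators=base, voting="hard").fit(X, y)

# Trained voting classifier: its input is the other classifiers' outputs,
# its output is the final classification.
stacked = StackingClassifier(estimators=base,
                             final_estimator=LogisticRegression()).fit(X, y)
```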

In another example, and for illustrative purposes, the training can use a classical approach for machine learning as noted by the previously discussed algorithms. In further embodiments, a deep learning approach for object classification can be used, either independently or within an ensemble of various classifiers as described above. As is understood by one of skill in the art, deep learning is a collection of machine learning algorithms based on neural networks; however, training deep learning models needs huge amounts of data and very powerful machines, and the training takes a very long time (weeks or months for big models with millions of images in the training set).

To account for these shortcomings, the classification module 115b can use several techniques to obtain quicker results based on pre-trained models, such as transfer learning using available pre-trained models (e.g., on ImageNet, Common Objects in Context (COCO), and Google's Open Images) that can be used as a base model. For object classification in images, it is contemplated to use Convolutional Neural Networks (ConvNets). ConvNets can be used with transfer learning for object recognition in different ways:

Features extraction: the base model is used as it is; only the classification layer (the final layer in the network) is removed. The output of the network without the final layer will give unique features for any input image in a fixed size vector. Using this, it is possible to extract the features for all training images and use them to train a simpler classifier from the previously mentioned machine learning algorithms, such as logistic regression, SVM, decision trees, or random forest.
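
A minimal sketch of this feature-extraction approach, assuming PyTorch/torchvision with a ResNet-18 base model (the random batch is a placeholder for preprocessed images):

```python
import torch
from torchvision import models

# Pre-trained base model with its final classification layer removed.
base = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
base.fc = torch.nn.Identity()  # drop the classification layer
base.eval()

# A random batch stands in for preprocessed training images.
image_batch = torch.randn(4, 3, 224, 224)
with torch.no_grad():
    features = base(image_batch)
print(features.shape)  # torch.Size([4, 512]): one fixed-size vector per image
# These vectors can now train a simpler classifier (SVM, logistic regression).
```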

Fine-tuning: the base model is used but is adapted to the new training dataset. This is done by freezing the whole neural network except the final few layers. Then, during the training, only the non-frozen layers are trained while the rest of the network does not change. In this way, it is possible to use the rich features from the training of the millions of images from the base model and adapt the last layers to the specific images in the set. A more specific form is just replacing the final layer responsible for classification with a new layer containing the new number of classes at hand and training the network with the new images.
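
A minimal sketch of this fine-tuning approach, under the same PyTorch/torchvision assumptions; the number of classes, optimizer settings, and placeholder batch are illustrative only:

```python
import torch
from torchvision import models

num_classes = 3  # hypothetical number of object classes
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the whole network, then replace the final classification layer with
# one matching the new number of classes; only the new layer will be trained.
for param in model.parameters():
    param.requires_grad = False
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

# One illustrative training step on a placeholder batch.
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, num_classes, (4,))
optimizer.zero_grad()
loss = loss_fn(model(images), labels)
loss.backward()
optimizer.step()
```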

In further embodiments, a ResNet architecture can be implemented for image classification. Other architectures are also contemplated, such as LeNet, AlexNet, VGG, ZFNet, Network in Network, Inception, Xception, ResNet, ResNeXt, Inception-ResNets, DenseNet, FractalNet, CapsuleNet, MobileNet, any of their versions, or any other architectures, by using a pre-trained base classifier for them. Also, detector/classification architectures can be used that combine the detection and classification, such as Yolo and RetinaNet (any of their versions), SPP-Net, Feature Pyramid Networks, Single shot detector (SSD), or Faster R-CNN (or any of its preceding versions). In still further embodiments, more advanced algorithms and techniques for architecture search and AutoML can be used to find the best architecture and training without hardcoded architecture types and parameters.

The accuracy of each classifier will be calculated on a validation set to assess its performance using, e.g., processor 115. Then the best one (or set of classifiers, if using an ensemble) will be used. The validation set can be obtained by splitting the training data into training and validation sets, either with a fixed proportion (e.g., 60% training and 40% validation, 70%-30%, 80%-20%, or other configurations), or using k-fold cross-validation, in which the dataset is split into k parts and the training is conducted k times, each time selecting one part as the validation set and the remaining parts as the training set, then averaging the results of the k trained classifiers. In embodiments, an F1-score is used to assess the accuracy, which is the harmonic mean of precision and recall. Alternatively, it is contemplated to use either precision, recall, specificity, or any other accuracy metric, such as Area Under the ROC (receiver operating characteristic) curve, or a combination of several metrics. In addition to accuracy, the time to classify each sample will be recorded. This will help to make the trade-off between speed and accuracy if the speed is an important factor. The user should specify this, and the system will determine the suitable algorithms based on the recorded time and accuracy for each classifier.
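
The following non-limiting sketch shows both validation strategies and an F1-based assessment, assuming scikit-learn and placeholder data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=300, n_features=12, n_informative=6,
                           n_classes=3, random_state=0)

# Fixed-proportion split (here 70% training / 30% validation).
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("F1 (macro):", f1_score(y_val, clf.predict(X_val), average="macro"))

# k-fold cross-validation: train k times and average the k results.
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         cv=5, scoring="f1_macro")
print("5-fold mean F1:", scores.mean())
```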

FIG. 3 shows a block diagram using a batch training process in accordance with aspects of the present disclosure. More specifically, FIG. 3 shows a batch training process using objects having similar characteristics using either or both a line scan camera and an area scan camera. In embodiments, FIG. 3 can be representative of a fixed system or a movable system (e.g., camera or other sensor); that is, in FIG. 3, the objects can be moving with respect to a fixed system (e.g., camera or other sensor) or the system (e.g., camera or other sensor) can be moving with respect to the fixed objects.

In FIG. 3, the batches of similar objects (objects with similar characteristics) are provided in three batches of different classes, e.g., objects with different characteristics. The number of classes can be two (2) or more according to the specific application. For example, the different characteristics of the objects are, e.g., square (class 1), triangle (class 2) and round (class 3). Other characteristics can be collected through various sensors. It should be understood by those of skill in the art that the characteristics can be representative of any physical characteristic such as, e.g., weight, color descriptors, shape descriptors, texture descriptors, temperature, humidity, depth, point cloud, material composition, etc., as discussed previously. These batches of objects can then be used to train a model for future action on objects of similar characteristics, as already noted herein and further described with respect to the flows of FIGS. 4 and 5.

FIG. 4 depicts an exemplary flow using batch training with a fixed system. Specifically, at step 400, a user will create training batches, with each batch representative of a specific class of objects. For example, in a farming situation, the user may create separate batches of green apples, yellow apples and red apples; although other criteria or characteristics may be used such as weight, size, texture, etc. At step 405, each batch is separately put on a conveyor (or each batch is separately moved past the sensor or camera in some other manner) and, at step 410, the camera will acquire the images for each batch. It should be understood, though, that this step may include obtaining object characteristics (e.g., features) with other sensor types, as described herein. It should be noted also that other situations might arise for the fixed system scenario, such as fixing the system above streets, or rivers, or any path in which there are moving objects to classify, or fixing the system below moving objects, such as to detect drones or flying birds. The image acquisition also might include segmenting or separating the objects from the background before classifying them using various algorithms as described herein. At step 415, the features of the captured images are extracted. The extracted features can include, as in all of the embodiments in a feature selection phase, the best applicable features (e.g., unique object characteristics that can be readily discerned or classified). At step 420, the extracted features are used to train a machine learning model. At step 425, after the training, the processes will provide a final machine learning model, which model can now be used to take an action on other objects. Some algorithms might combine feature extraction and classification, as in deep learning algorithms.
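
A hedged, end-to-end sketch of this batch training flow is shown below; the helper functions, batch names, and the choice of an SVM are hypothetical stand-ins for the sensor- and application-specific parts:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def acquire_images(batch_name):
    # Placeholder for steps 405-410: capture images of one batch.
    return [rng.random((64, 64)) for _ in range(20)]

def extract_features(image):
    # Placeholder for step 415: segment the object and extract selected features.
    return [image.mean(), image.std()]

features, labels = [], []
for class_label, batch in enumerate(["green_apples", "yellow_apples", "red_apples"]):
    for image in acquire_images(batch):  # each batch is one class (step 400)
        features.append(extract_features(image))
        labels.append(class_label)

# Steps 420-425: train and keep the final model for actions on other objects.
final_model = SVC().fit(features, labels)
```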

FIG. 5 depicts an exemplary flow using a batch training process with a moving system. Specifically, at step 500, a user will create training batches, with each batch representative of an object class. For example, in a farming situation, the user may create separate batches of green apples, yellow apples and red apples; although other criteria or characteristics may be used such as weight, size, texture, etc. At step 505, each batch is placed in a separate area or region. Alternatively, each batch can be identified in a specific region using, e.g., GPS methodologies. At step 510, the camera (or other sensor) will acquire the images (or other characteristics) for each object in the batch at or in the specific area or region. It should be noted that the moving body that the system is attached to can comprise many moving systems, such as any vehicle, drone, or moving robot (bi-pedal; 4, 6, or 8 legged robots; robots on tires; robotic arm; etc.), or a handheld device of any type, e.g., phone or tablet.

The image acquisition might include segmenting the objects from the background before classifying them using various algorithms as described herein. At step 515, the features of the captured images are extracted. At step 520, the extracted features are used to train a machine learning model. At step 525, after the training, the processes will provide a final machine learning model, which model can now be used to take an action on other objects. Some algorithms might combine feature extraction and classification, as in deep learning algorithms.

FIG. 6 shows a block diagram using a mixed training process in accordance with aspects of the present disclosure. In embodiments, FIG. 6 can be representative of a fixed system or a movable system (e.g., camera or other sensor); that is, in FIG. 6, the objects can be moving with respect to a fixed system (e.g., camera or other sensor) or the system (e.g., camera or other sensor) can be moving with respect to the fixed objects.

More specifically, FIG. 6 shows a mixed training process using objects having dissimilar characteristics using either or both a line scan camera and an area scan camera. In FIG. 6, the batches of dissimilar objects are labeled by the operator as they are imaged, e.g., to train on the objects. Alternatively, the images and data are saved and labeled off-line by the operator. The labeling process might also be done either on a local machine, a machine in the local network, a remote server, or the cloud, by the operator(s) or another party. As in any of the scenarios, it should be understood that the more training performed, e.g., labeling, the better the set will be for homing in on the different subtleties there might be, in order to use it in the deployment stage. These objects can then be used to train a model for future action on objects of similar characteristics, as already noted herein and further described with respect to the flows of FIGS. 7 and 8.

FIG. 7 depicts an exemplary flow using a mixed training process with a fixed system in accordance with aspects of the present disclosure. Specifically, at step 700, the objects are placed on a conveyor by the user; although, as discussed previously, the system can be installed in settings other than a conveyor situation. For example, the objects can be separately moved past the sensor or camera in some other manner. In this example, the objects are of a mixed nature, e.g., having different characteristics. At step 705, as the objects are imaged and/or readings from sensors are taken, the operator (user) will label the captured objects, e.g., train on the objects. It is also contemplated to label data other than images that came from the sensor or other sources. Alternatively, the images or other characteristics can be saved and then labeled offline, either by the operators or by another party as discussed previously. At step 710, the features of the captured images are extracted. At step 715, the extracted features are used to train a machine learning model. At step 720, after the training, the processes will provide a final machine learning model, which model can now be used to take an action on other objects. Some algorithms might combine feature extraction and classification, as in deep learning algorithms.
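
A hedged sketch of this mixed training flow, in which the operator labels each object as it is captured; the capture and feature helpers are hypothetical placeholders:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def capture_object():
    # Placeholder for steps 700-705: one image/sensor reading per passing object.
    return rng.random((64, 64))

def extract_features(image):
    # Placeholder for step 710.
    return [image.mean(), image.std()]

features, labels = [], []
for _ in range(30):
    image = capture_object()
    # The operator labels each mixed object on-site as it is captured;
    # at least two distinct labels are needed before training.
    label = input("Label for this object (e.g., green/yellow/red): ")
    features.append(extract_features(image))
    labels.append(label)

# Steps 715-720: train and keep the final model.
final_model = LogisticRegression(max_iter=1000).fit(features, labels)
```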

FIG. 8 depicts an exemplary flow using a mixed training process with a moving system in accordance with aspects of the present disclosure. Specifically, at step 800, images of the objects (or other characteristics) are obtained from different regions or areas by a moving sensor. In this example, again, the objects are of a mixed nature, e.g., having different characteristics. At step 805, as the objects are imaged and/or readings from sensors are taken, the operator (user) will label the captured objects. Alternatively, the images or other characteristics can be saved and then labeled offline, either by the operators or by another party as discussed previously. The image acquisition can include segmenting the objects from the background before classifying them using various algorithms as described herein. At step 810, the features of the captured images are extracted. At step 815, the extracted features are used to train a machine learning model. At step 820, after the training, the processes will provide a final machine learning model, which model can now be used to take an action on other objects. Some algorithms might combine feature extraction and classification, as in deep learning algorithms.

As should now be understood, FIGS. 4, 5, 7 and 8 depict exemplary flows for processes in accordance with aspects of the present disclosure. The exemplary flows can be illustrative of a system, a method, and/or a computer program product and related functionality implemented on the computing system of FIG. 2, in accordance with aspects of the present disclosure. The computer program product may include computer readable program instructions stored on a computer readable storage medium (or media). The computer readable storage medium includes the one or more storage mediums as described with regard to FIG. 2, e.g., non-transitory media, a tangible device, etc. The method and/or computer program product implementing the flow of FIG. 4 can be downloaded to respective computing/processing devices, e.g., the computing system of FIG. 2 as already described herein, or implemented on a cloud infrastructure as described with regard to FIG. 2. The machine learning model training and deployment can be done either locally or remotely. The system on-site can consist of edge devices, PCs, and any type of workstations or computing machines. Remote infrastructure might include remote servers or cloud infrastructures, as examples. And, in embodiments, the system can be trained on premise at the edge device, personal computer, workstation, or other computation device, as well as trained on remote servers/workstations or a cloud infrastructure.

The foregoing examples have been provided merely for the purpose of explanation and are in no way to be construed as limiting of the present disclosure. While aspects of the present disclosure have been described with reference to an exemplary embodiment, it is understood that the words which have been used herein are words of description and illustration, rather than words of limitation. Changes may be made, within the purview of the appended claims, as presently stated and as amended, without departing from the scope and spirit of the present disclosure in its aspects. Although aspects of the present disclosure have been described herein with reference to particular means, materials and embodiments, the present disclosure is not intended to be limited to the particulars disclosed herein; rather, the present disclosure extends to all functionally equivalent structures, methods and uses, such as are within the scope of the appended claims.

What is claimed is:
1. A method comprising: extracting, using a computing device, features of a plurality of objects; training, using the computing device, a machine learning model with selected ones of the extracted features; building, using the computing device, a final machine learning model of the selected features after all of the plurality of objects for training are captured; and performing, using the computing device, an action on subsequent objects based on their characteristics matching the selected features in the final machine learning model.

2. The method of claim 1, further comprising capturing the features using a sensor or plurality of sensors, wherein the selected features are similar characteristics in a batch of objects.

3. The method of claim 1, wherein the training is a batch training process comprising training on a plurality of similar objects in a batch of objects, at a single time and on-site of where the action is performed by a same or another machine.

4. The method of claim 3, wherein the batch training process comprises acquiring images and/or data from sensors of each object in the batch of objects from a specified region using a moving camera or sensor, wherein the selected features are extracted from the images.

5. The method of claim 1, further comprising capturing the features using a sensor, wherein the selected features are a mix of different objects with different features.

6. The method of claim 5, wherein the training is a mixed training process with a mix of different object classes, which includes manually labeling the objects after they are captured to use them for training.

7. The method of claim 1, further comprising, after finishing the training, validating results of the final machine learning model on new objects that were not previously captured.

8. The method of claim 1, wherein the features are captured by a fixed or moving sensor which captures the features of the plurality of objects.

9. The method of claim 1, wherein, at the training, the plurality of objects may be separated from their background using image processing techniques, before extracting features and classifying of the plurality of objects using the features.

10. The method of claim 1, wherein the training uses multi-class classification algorithms.

11. The method of claim 1, wherein the training is implemented with a single classifier or an ensemble of classifiers.

12. A system comprising: a processor, a computer readable memory, one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions executable to: receive captured images, data, and features of a plurality of objects from a sensor; extract selected features from the captured images; train a machine learning model with the selected captured and extracted features; build a final machine learning model of the selected features after training from the plurality of objects is completed; and perform an action on subsequent objects based on the trained final machine learning model.

13. The system of claim 12, wherein the action to be performed is a classifying of the subsequent objects based on the trained final machine learning model.

14. The system of claim 12, wherein the training uses pre-trained deep learning models, including using feature extraction and transfer learning.

15. The system of claim 12, wherein the system is trained on premise at an edge device, personal computer, workstation, or other computation device.

16. The system of claim 12, wherein the system is trained on remote servers/workstations or a cloud infrastructure.

17. The system of claim 12, wherein the program instructions are executable to provide detection, segmentation, features extraction and selection, and classification, directly on an edge device, on a device in the local network, or on a remote device in a remote network or on the cloud.

18. The system of claim 12, wherein the program instructions are executable to directly switch between training and deployment of an operation mode, to immediately be used after training.

19. The system of claim 12, further comprising manually labeling features of the objects which have different characteristics on-site or off-site, either by operators or another party.

20. The system of claim 12, wherein the captured images are captured by image capturing devices, including at least one of gray scale cameras, color cameras, multi-spectral cameras, hyper-spectral cameras, thermal cameras, X-ray imaging, and ultrasound imaging.

21. The system of claim 12, wherein the capturing is performed by sensors to capture desired characteristics of the objects including images, size, aspect ratio, color, reflectance, perimeter, texture, weight, temperature, humidity, and/or material composition.

22. The system of claim 12, further comprising using data from external sources to augment classification capability, including weather and GPS data, wherein the data is used in a training phase or deployment phase.

23. The system of claim 12, further comprising actuators for actions to be performed on the objects after the training.

24. The system of claim 12, wherein the actions are programmatically provided by saving to a database or sending alerts, triggers, or commands to another system.

25. The system of claim 12, further comprising interacting with different systems and interfaces by obtaining the data, sending the data, getting control or trigger signals, or sending control or trigger signals.

26. The system of claim 12, wherein the system is either installed in a fixed location or on moving bodies.

27. The system of claim 26, wherein the fixed system is fixed on top of a way that has moving objects or below a way having the moving objects.

28. The system of claim 27, further comprising capturing the data using a moving system attached to moving bodies including any vehicle, drone, or robot.

29. The system of claim 12, further comprising handheld devices which contain the system or are part of the system.

30. The system of claim 12, wherein the objects to be classified are fixed or moving objects.

31. The system of claim 12, wherein single or multiple features are used to classify the objects, and the classification is provided by using a single classifier or an ensemble of classifiers.

32. The system of claim 31, further comprising manually configured classifier algorithms, or automatic algorithms selected from classifiers based on accuracy and speed.

33. The system of claim 12, wherein the images and/or data are captured using a single camera, multiple cameras, a single sensor or multiple sensors, or combinations thereof.

34. The system of claim 12, further comprising a camera, sensors, storage, processing, and computation units, which are gathered in one enclosure or developed into separate modules that are connected within a same location or distributed into many locations.